pronunciation dictionary
The Development of a Comprehensive Spanish Dictionary for Phonetic and Lexical Tagging in Socio-phonetic Research (ESPADA)
Pronunciation dictionaries are an important component of speech forced alignment. Their accuracy strongly affects the aligned speech data, since they mediate the mapping between orthographic transcriptions and acoustic signals. In this paper, I present the creation of a comprehensive Spanish pronunciation dictionary (ESPADA) that can be used with data from most dialectal variants of Spanish. Current dictionaries focus on specific regional variants, but the flexible nature of this tool allows it to capture the most common phonetic differences across major dialectal variants. I propose improvements to current pronunciation dictionaries as well as the mapping of other relevant annotations, such as morphological and lexical information. In terms of size, it is currently the most complete dictionary of its kind, with more than 628,000 entries representing words from 16 countries. All entries come with their corresponding pronunciations, morphological and lexical tags, and other information relevant to phonetic analysis: stress patterns, phonotactics, IPA transcriptions, and more. The aim is to equip socio-phonetic researchers with a complete open-source tool that enhances dialectal research on the Spanish language within socio-phonetic frameworks.
- Europe > Austria > Vienna (0.14)
- North America > Mexico > Campeche (0.04)
- South America > Venezuela (0.04)
- (21 more...)
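As a rough illustration of the per-entry annotations the ESPADA abstract describes (pronunciation, stress, morphological tag, IPA), here is a minimal sketch of a dictionary entry and lookup in Python. The field names and sample values are illustrative assumptions, not ESPADA's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DictEntry:
    """One pronunciation-dictionary entry (field names are illustrative)."""
    orthography: str      # written form
    ipa: str              # broad IPA transcription
    syllables: List[str]  # syllabified pronunciation
    stress_index: int     # 0-based index of the stressed syllable
    pos: str              # part-of-speech / morphological tag

# A toy two-entry lexicon in the spirit of ESPADA's annotations.
LEXICON: Dict[str, DictEntry] = {
    "casa":   DictEntry("casa",   "ˈkasa",   ["ka", "sa"],   0, "NOUN"),
    "cantar": DictEntry("cantar", "kanˈtaɾ", ["kan", "taɾ"], 1, "VERB"),
}

def lookup(word: str) -> DictEntry:
    """Return the entry for a word, raising KeyError for OOV items."""
    return LEXICON[word.lower()]

if __name__ == "__main__":
    entry = lookup("cantar")
    print(entry.ipa, entry.syllables[entry.stress_index])  # kanˈtaɾ taɾ
```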
Large Vocabulary Spontaneous Speech Recognition for Tigrigna
Kahsu, Ataklti, Teferra, Solomon
This thesis proposes and describes a research attempt at designing and developing a speaker-independent spontaneous automatic speech recognition system for Tigrigna. The acoustic model of the speech recognition system is developed using the Carnegie Mellon University automatic speech recognition development toolkit (Sphinx), while the SRILM toolkit is used for the development of the language model. Keywords: Automatic Speech Recognition, Tigrigna language.
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)
- North America > United States > New Jersey (0.04)
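The thesis pairs a Sphinx acoustic model with an n-gram language model. As a minimal sketch of what language-model estimation involves, the following toy bigram model with add-one smoothing is illustrative only; it does not reflect the thesis's actual SRILM configuration, and the example sentences are placeholders rather than real Tigrigna data.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate bigram probabilities with simple add-one smoothing."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob

# Toy usage with placeholder sentences standing in for a training corpus.
prob = train_bigram_lm(["selam alem", "selam adey"])
print(prob("<s>", "selam"))
```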
My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech
Pradhan, Sameer S., Cole, Ronald A., Ward, Wayne H.
This article describes the MyST corpus developed as part of the My Science Tutor project -- one of the largest collections of children's conversational speech, comprising approximately 400 hours and spanning some 230K utterances across about 10.5K virtual tutor sessions by around 1.3K third, fourth and fifth grade students. About 100K of these utterances have been transcribed thus far. The corpus is freely available (https://myst.cemantix.org) for non-commercial use under a Creative Commons license. It is also available for commercial use (https://boulderlearning.com/resources/myst-corpus/). To date, ten organizations have licensed the corpus for commercial use, and approximately 40 university and other not-for-profit research groups have downloaded it. It is our hope that the corpus can be used to improve automatic speech recognition algorithms, to build and evaluate conversational AI agents for education, and thereby to help accelerate the development of multimodal applications that improve children's excitement about and learning of science and help them learn remotely.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > Colorado > Boulder County > Boulder (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
Ribeiro, Manuel Sam, Comini, Giulia, Lorenzo-Trueba, Jaime
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, these tend to rely on manually annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation. Given the hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words, and we use those to re-train the G2P system. Results indicate that our approach consistently improves the phone error rate of G2P systems across languages and amounts of available data.
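A minimal sketch of the bootstrapping idea described above: seed a toy G2P from a small lexicon, add hypothesized pronunciations for out-of-vocabulary words (standing in for phone-recognizer output on speech recordings), and retrain. The trivial character-mapping "model" and the toy data are assumptions for illustration, not the paper's neural G2P.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[str]]

def train_g2p(lexicon: Lexicon):
    """Toy 'G2P model': per-character mapping learned from the lexicon."""
    # For illustration only; a real system would use a neural sequence model.
    char_map: Dict[str, str] = {}
    for word, phones in lexicon.items():
        for ch, ph in zip(word, phones):      # naive 1:1 alignment
            char_map.setdefault(ch, ph)
    return lambda w: [char_map.get(ch, ch) for ch in w]

def bootstrap(seed: Lexicon, decoded: List[Tuple[str, List[str]]], rounds: int = 2):
    """Expand the lexicon with hypothesized pronunciations of OOV words, then retrain."""
    lexicon = dict(seed)
    g2p = train_g2p(lexicon)
    for _ in range(rounds):
        # 'decoded' stands in for (word, phone-sequence) pairs obtained by running
        # a phone recognizer over speech recordings, as in the paper's pipeline.
        for word, phones in decoded:
            if word not in lexicon:           # learn OOV pronunciations
                lexicon[word] = phones
        g2p = train_g2p(lexicon)              # re-train on the expanded lexicon
    return g2p, lexicon

seed = {"cat": ["k", "æ", "t"]}
g2p, lex = bootstrap(seed, [("cab", ["k", "æ", "b"])])
print(g2p("bat"))  # uses mappings learned from both seed and hypothesized entries
```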
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Do, Phat, Coler, Matt, Dijkstra, Jelske, Klabbers, Esther
We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the applicability of the findings. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve output quality, with the latter performing better, but these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family-tree-based distance measure as a criterion for selecting source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
- Europe > Netherlands (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
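The abstract above uses Angular Similarity of Phone Frequencies (ASPF) as a source-language selection criterion. A minimal sketch, assuming the standard angular-similarity formula over phone-frequency vectors (1 − 2·arccos(cos θ)/π); the paper's exact normalization and frequency estimation may differ.

```python
import math
from collections import Counter

def aspf(phones_a, phones_b):
    """Angular similarity between two phone-frequency vectors.

    Assumes the usual angular-similarity form for non-negative vectors:
    1 - 2*arccos(cosine_similarity)/pi. Illustrative only.
    """
    fa, fb = Counter(phones_a), Counter(phones_b)
    phones = set(fa) | set(fb)
    dot = sum(fa[p] * fb[p] for p in phones)
    norm_a = math.sqrt(sum(v * v for v in fa.values()))
    norm_b = math.sqrt(sum(v * v for v in fb.values()))
    cos = dot / (norm_a * norm_b)
    return 1.0 - 2.0 * math.acos(min(1.0, cos)) / math.pi

# Toy phone sequences; frequencies are implied by repetition.
print(aspf(["a", "a", "t", "k"], ["a", "t", "t", "s"]))
```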
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
Do, Phat, Coler, Matt, Dijkstra, Jelske, Klabbers, Esther
We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperforms using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a) using a massively multilingual model for grapheme-to-phone (G2P) conversion in both training and synthesis, and b) using a universal phone recognizer to create a makeshift dictionary. Results show that the G2P approach performs largely on par with using a ground-truth dictionary, while the phone recognition approach, although generally worse, remains a viable option for LRLs less suited to the G2P approach. Within each approach, using articulatory features as input outperforms using phone labels.
- Europe > Netherlands (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.63)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)
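To illustrate the difference between phone-label input and articulatory-feature input discussed in the abstract above, here is a minimal sketch that replaces each phone with a feature vector drawn from a small lookup table. The feature inventory and values are simplified assumptions, not the feature set used in the paper.

```python
from typing import Dict, List

# Illustrative articulatory feature table (simplified assumptions).
FEATURES: Dict[str, Dict[str, int]] = {
    "p": {"voiced": 0, "nasal": 0, "labial": 1, "vowel": 0, "high": 0},
    "b": {"voiced": 1, "nasal": 0, "labial": 1, "vowel": 0, "high": 0},
    "m": {"voiced": 1, "nasal": 1, "labial": 1, "vowel": 0, "high": 0},
    "i": {"voiced": 1, "nasal": 0, "labial": 0, "vowel": 1, "high": 1},
    "a": {"voiced": 1, "nasal": 0, "labial": 0, "vowel": 1, "high": 0},
}
FEATURE_ORDER = ["voiced", "nasal", "labial", "vowel", "high"]

def phones_to_features(phones: List[str]) -> List[List[int]]:
    """Replace each phone label with its articulatory feature vector."""
    return [[FEATURES[p][f] for f in FEATURE_ORDER] for p in phones]

# A TTS encoder can consume these vectors instead of one-hot phone labels.
print(phones_to_features(["b", "a", "m", "i"]))
```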
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
Wang, Lei, Tong, Rong, Leung, Cheung Chi, Sivadas, Sunil, Ni, Chongjia, Ma, Bin
This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As little existing work has been carried out on these regional languages, several difficulties must be addressed before building the systems: limited speech and text resources, a lack of linguistic knowledge, and so on. This work takes Bahasa Indonesia and Thai as examples to illustrate strategies for collecting the various resources required to build ASR systems.
- Asia > Singapore (0.05)
- Oceania > Australia > Queensland > Brisbane (0.05)
- Asia > Myanmar (0.05)
- (14 more...)
Multi-Module G2P Converter for Persian Focusing on Relations between Words
Rezaei, Mahdi, Nayeri, Negar, Farzi, Saeed, Sameti, Hossein
G2P systems aim to convert a grapheme (letter) sequence into its pronunciation sequence, and are an essential component of text-to-speech (TTS) and speech recognition systems for any language lacking consistent pronunciation rules. A good G2P system must address the issues of out-of-vocabulary (OOV) words and cross-word relations. OOV words are those which are not present in the lexicon, meaning they were not seen during model training. In the case of G2P, the lexicon is a dictionary consisting of graphemes and their respective phonemes. As for cross-word relations, a Persian G2P task is mainly concerned with homographs and ezafe constructions. In this paper, we investigate the application of end-to-end and multi-module frameworks for G2P conversion for the Persian language. The results demonstrate that our proposed multi-module G2P system outperforms our end-to-end systems in terms of accuracy and speed. The system consists of a pronunciation dictionary as our look-up table, along with separate models to handle homographs, OOVs and ezafe in Persian, created using GRU and Transformer architectures. The system is sequence-level rather than word-level, which allows it to effectively capture the unwritten relations between words (cross-word information) necessary for …
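A minimal sketch of the multi-module dispatch described above: dictionary look-up first, a fallback model for OOV words, and a sequence-level rule standing in for the ezafe module. The toy entries, the naive OOV "model", and the ezafe rule are illustrative assumptions, not the paper's GRU/Transformer components.

```python
from typing import Dict, List

# Toy stand-ins for the paper's components; entries and rules are illustrative.
LOOKUP: Dict[str, str] = {"ketab": "k e t A b", "man": "m a n"}

def oov_model(word: str) -> str:
    """Placeholder for a trained OOV model: naive letter-to-phone mapping."""
    return " ".join(word)

def add_ezafe(phones: List[str], i: int, words: List[str]) -> List[str]:
    """Toy cross-word rule: insert the ezafe vowel before a following word."""
    if i + 1 < len(words):               # a real system would use a sequence model
        return phones + ["e"]
    return phones

def g2p_sequence(words: List[str]) -> List[str]:
    out = []
    for i, w in enumerate(words):
        phones = (LOOKUP[w] if w in LOOKUP else oov_model(w)).split()
        phones = add_ezafe(phones, i, words)   # sequence-level, not word-level
        out.append(" ".join(phones))
    return out

print(g2p_sequence(["ketab", "man"]))  # ['k e t A b e', 'm a n']
```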
Researchers teach a computer to compose sonnets like Shakespeare
In addition to penning 37 plays, William Shakespeare was a prolific composer of sonnets -- crafting 154 of them during his life. Now, more than 400 years after his death, the Bard's words are influencing a new generation of poets. It's just that these writers do so with silicon imaginations and digital quills. A consortium of researchers from the University of Toronto, the University of Melbourne and IBM's Australia division has managed to teach a neural network to craft sonnets just as the Bard did in the 16th century, using his own words to teach the machine. They published their results at the 2018 ACL conference, and you can play around with the network itself over at GitHub.